1. INTRODUCTION

Television has gone through many evolutions since its birth, from black-and-white to color, and
High-Definition Television is now the most popular format on the market. Since then, the development of
3D television systems has attracted more and more attention.

Depth estimation, or depth extraction, aims to measure the distance of, ideally, each point of the observed
scene, and it belongs to the stereo vision research field. In the absence of information for absolute depth
measurement, such as motion, defocus or binocular disparity, the absolute distance between the observer or
camera and the objects in the scene or image cannot be measured. Depth estimation is a very important
aspect of various applications. For example, when the distance between the cameras or some camera
parameters change, depth estimation can be used to calculate the distance between the camera and the
imaged objects in both cases, as discussed in [8]. Cues such as shading, edges and junctions may provide a
3-D model, but they do not give the exact scale of the space.

There are many ways to retrieve 3D information. The most common and straightforward method is
to use an “active device”, i.e. an active camera, which can capture the original image of the scene and
detect the depth of each pixel of the scene simultaneously. Examples are infrared cameras, sonar cameras,
etc. However, most active devices need an additional sensor to obtain depth information, so they are
generally more expensive than common cameras. On the other hand, since they have specifically designed
sensors, the detected depth maps are usually more accurate. In addition to active devices, we can also use a
“passive device”, which requires some post-processing after the acquisition of the scene of interest, such as
binocular cameras and common cameras with focus-tuning functionality. Compared to active devices,
passive devices are much cheaper and, because they are not limited by the sensor energy issue of active
devices, they can generate depth maps of higher resolution.


Journal homepage: http://iaescore.com/online/index.php/IJAAS


IJAAS ISSN: 2252-8814


Among passive methods, a single common camera is the most preferred device, since it can be bought on
the general market and is easy for most people to use. However, the lack of information from additional
sensors, as in active devices, or from the different viewing angles of an additional camera, as in binocular
cameras, makes the calculation of a depth map from a single image very difficult. To compensate for this
shortcoming, a series of images of a scene can be captured at different focus planes to gather more
information than a single image provides. Previous works report better results with this approach than
with a single image alone, and many of the algorithms are suitable for future hardware implementation in
real-time applications.

Depth perception from stereo vision is based on the triangulation principle [9]. Two cameras with
projective optics are arranged side by side such that their fields of view overlap at the desired object
distance. By taking a picture with each camera, the scene is captured from two different viewpoints. This
setup assumes focused images. In Depth from Defocus, the challenge is to perceive depth accurately when
not all objects in the scene are in focus [10]. The camera positions, light intensity and focal lengths of the
camera may vary, which yields blurred images. Motivated by this, this paper presents a survey on Depth
from Defocus.
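
As an illustration of the triangulation principle, a rectified pair of identical pinhole cameras gives the standard relation Z = f B / d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity of a point between the two views. The sketch below assumes this idealized geometry; the function name and the numerical values are illustrative and are not taken from [9].

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair (pinhole model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px


# Illustrative values (assumptions): f = 700 px, B = 0.12 m, d = 35 px
depth_m = depth_from_disparity(700.0, 0.12, 35.0)
print(f"estimated depth: {depth_m:.2f} m")
```

Because depth is inversely proportional to disparity, a fixed error in the disparity has a larger effect on the estimate for distant points than for near ones.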

The remainder of this paper is arranged as follows: Section II describes the literature survey
that has been done in the field of depth from images. Section III presents some experimental results of the
literature. Section IV gives the conclusion.


2. LITERATURE SURVEY 

In a natural image, objects on the focus plane appear sharper than those away from it, so the amount of
blur depends on the depth of each object. Algorithms that use the sharpness or blur of an object as their
depth cue are categorized as “Depth From Focusing” or “Depth From Defocusing” algorithms.

Both “Depth From Focusing” and “Depth From Defocusing” algorithms need a focus measure, which
quantifies the sharpness of an object in order to determine its depth. There are many ways to measure
sharpness, in either the spatial domain or the frequency domain. All of them can be viewed as some form
of high-pass filter seen from a different point of view.
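
A minimal sketch of one such focus measure is given below: the variance of the response to a discrete Laplacian, which is a simple spatial-domain high-pass filter. This is a commonly used measure chosen here for illustration and is not attributed to any of the surveyed papers. The helper names, the periodic boundary handling and the box blur used as a stand-in for defocus are all assumptions.

```python
import numpy as np


def laplacian(img):
    """4-neighbour discrete Laplacian with periodic boundaries."""
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)


def focus_measure(img):
    """Variance of the Laplacian response: a larger value means a sharper image."""
    return float(np.var(laplacian(np.asarray(img, dtype=float))))


def box_blur(img, passes=1):
    """Crude stand-in for defocus: repeated 3x3 mean filter (periodic)."""
    out = np.asarray(img, dtype=float)
    for _ in range(passes):
        out = sum(np.roll(np.roll(out, dy, 0), dx, 1)
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return out


rng = np.random.default_rng(0)
sharp = rng.random((64, 64))          # synthetic textured patch
blurred = box_blur(sharp, passes=2)   # same patch after artificial blur

fm_sharp = focus_measure(sharp)
fm_blurred = focus_measure(blurred)
print(fm_sharp > fm_blurred)
```

In a focus-stack setting, such a measure would be evaluated per pixel neighbourhood in each frame, and the frame with the largest response would indicate the focus plane closest to the object.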

Tayebeh Rajabzadeh and Abedin Vahedian [1] introduced a new method that uses similar
characteristics of defocus blur. It was found in [1] that the change in object distance from the camera is
directly related to the amount of defocus blur in the image. Compared to conventional defocus and other
methods, the proposed method [1] was shown to be a blind method, i.e. no in-focus image of the object is
required. Another advantage of this method [1] is that it is independent of the 3-D attributes of the objects
or the scene. Its complexity is also lower than that of similar methods.
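
The relation between object distance and defocus blur can be illustrated with the standard thin-lens model, in which the blur-circle diameter on the sensor is b = D s |1/f - 1/u - 1/s| for aperture diameter D, focal length f, sensor distance s and object distance u. The sketch below uses this textbook model only; it is not the method of [1], and the lens parameters are illustrative assumptions.

```python
def blur_circle_diameter(object_dist, focal_length, aperture_diam, sensor_dist):
    """Blur-circle diameter on the sensor under the thin-lens model.

    All lengths share one unit (here millimetres).
    """
    return aperture_diam * sensor_dist * abs(
        1.0 / focal_length - 1.0 / object_dist - 1.0 / sensor_dist)


# Illustrative 50 mm lens with a 25 mm aperture (assumed values)
f_mm, aperture_mm = 50.0, 25.0
focus_dist_mm = 2000.0
# Sensor position that brings an object at 2 m into focus (thin-lens equation)
sensor_mm = 1.0 / (1.0 / f_mm - 1.0 / focus_dist_mm)

blur_in_focus = blur_circle_diameter(2000.0, f_mm, aperture_mm, sensor_mm)
blur_at_3m = blur_circle_diameter(3000.0, f_mm, aperture_mm, sensor_mm)
blur_at_5m = blur_circle_diameter(5000.0, f_mm, aperture_mm, sensor_mm)
print(blur_in_focus, blur_at_3m, blur_at_5m)
```

Under this model the blur vanishes on the focus plane and grows as the object moves away from it, which is what makes the amount of blur usable as a depth cue.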

Cassandra Swain, Alan Peters and Kazuhiko Kawamura [2] improved the accuracy of depth from
defocus using a fuzzy logic technique. Fuzzy logic in [2] is combined with a depth from defocus technique
to correct for uncertainty and imprecision in depth estimation.

The two inputs to the fuzzy algorithm in [2] are focus quality and focal error. Focus quality is a
measure of the amount of defocus in the image. Experimental results in [2] show that fuzzy logic
significantly improves depth estimation compared to non-fuzzy depth from defocus.

Junlan Yang and Dan Schonfeld [3] presented a novel method for virtual focus and object depth
estimation from defocused video captured by a moving camera. They use the term virtual focus to refer to a
new approach [3] for producing in-focus image sequences by processing blurred videos captured by
out-of-focus cameras. The method used in [3] relies on the concept of Depth-from-Defocus (DFD). The
authors explored several blur models which can be used to recover arbitrary transfer functions.

The work of Sangjin Kim, Eunsung Lee and Monson H. Hayes [4] presents a novel approach to depth
estimation using a multiple color-filter aperture (MCA) camera and its application to multifocusing. An
image acquired by the MCA camera in [4] contains spatially varying misalignment among the RGB color
channels, where the direction and length of the misalignment are a function of the distance of an object
from the plane of focus. Therefore, if the misalignment is estimated from the MCA output image in [4],
multifocusing and depth estimation become possible using a set of image processing algorithms.
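
To make the idea of estimating channel misalignment concrete, the sketch below recovers an integer shift between two color channels from the peak of their circular cross-correlation. This is a generic registration technique used here for illustration; it is not the color shift model of [4], and the synthetic channels, the shift values and the function name are assumptions.

```python
import numpy as np


def estimate_shift(reference, shifted):
    """Integer (dy, dx) such that rolling `reference` by (dy, dx) best matches `shifted`."""
    cross = np.fft.ifft2(
        np.fft.fft2(shifted) * np.conj(np.fft.fft2(reference))).real
    dy, dx = np.unravel_index(np.argmax(cross), cross.shape)
    h, w = cross.shape
    # Indices above half the image size correspond to negative shifts
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


rng = np.random.default_rng(1)
green = rng.random((64, 64))                     # synthetic reference channel
red = np.roll(green, shift=(3, -2), axis=(0, 1))  # same texture, misaligned

shift = estimate_shift(green, red)
print(shift)
```

In an MCA image the shift would vary across the image, so an estimate of this kind would have to be computed per region rather than once for the whole frame.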

The MCA camera in [4], together with the proposed image processing algorithms, enables automatic,
computationally efficient multifocusing using a three-step process that involves: (i) image segmentation for
classifying clusters, (ii) color shift model-based registration and fusion, and (iii) image restoration. More
specifically, an image acquired by the MCA camera is first segmented into multiple clusters, each of which
has a uniform color, and then a corresponding rectangular region is generated that encloses each cluster.
The MCA camera significantly enhances the visual quality of an image containing multiple objects at
different distances [4].

In [5], Wided Miled, Jean-Christophe Pesquet and Michel Parent presented a new method for robust
depth estimation from a stereo pair under varying illumination conditions. First, a spatially varying
multiplicative model is developed in [5] to account for brightness changes induced between the left and
right views. The depth estimation problem, based on this model [5], is then formulated as a constrained
optimization problem in which an appropriate convex objective function is minimized under various
convex constraints modeling prior knowledge and observed information.

The resulting multiconstrained optimization problem in [5] is finally solved via a parallel block
iterative algorithm, which offers great flexibility in the incorporation of several constraints. Experimental
results on both synthetic and real stereo pairs [5] demonstrate the good performance of the method in
efficiently recovering the depth and illumination variation fields simultaneously. The authors of [5] thus
proposed a convex programming approach to the problem of disparity estimation in the presence of
illumination variations.

In the paper by C. Paramanand and A. N. Rajagopalan [6], the objective was to recover the 3-D
structure of a scene from motion blur or optical defocus. In the proposed approach [6], the difference in
blur between two observations is used as a cue for recovering depth, within a recursive state estimation
framework.

For motion blur in [6], the authors used an image pair consisting of an unblurred (focused) image and
a blurred image. Since the relationship between the observation and the scale factor of the point spread
function associated with the depth at a point is nonlinear, they proposed and developed a formulation of
the unscented Kalman filter for depth estimation.

Depth estimation from a single image is a challenging problem in computer vision research [7]. By
analyzing the defocus cues produced by the depth of field of the lens, depth information can be
determined. Patrick P. K. Chan, Bing-Zhong Jing, Wing W. Y. Ng and Daniel S. Yeung, in their work [7],
employed the reverse heat equation, which is simple and effective, for this analysis.
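
The reverse heat equation runs heat diffusion backwards in time, I_t = -ΔI, which amplifies the high frequencies that blur has attenuated. The sketch below shows a few explicit steps of this update on a synthetically blurred edge; it illustrates the equation only and does not reproduce how [7] derives blur or depth from it. The step size, iteration counts and helper names are assumptions.

```python
import numpy as np


def laplacian(img):
    """4-neighbour discrete Laplacian with periodic boundaries."""
    return (np.roll(img, 1, 0) + np.roll(img, -1, 0)
            + np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4.0 * img)


def forward_heat_step(img, dt=0.1):
    """One explicit step of I_t = +Laplacian(I): diffusion, i.e. blurring."""
    return img + dt * laplacian(img)


def reverse_heat_step(img, dt=0.1):
    """One explicit step of I_t = -Laplacian(I): diffusion run backwards."""
    return img - dt * laplacian(img)


# Synthetic vertical step edge, blurred by forward diffusion
edge = np.zeros((32, 32))
edge[:, 16:] = 1.0
blurred = edge.copy()
for _ in range(10):
    blurred = forward_heat_step(blurred)

# A few reverse steps sharpen the edge again
sharpened = blurred.copy()
for _ in range(5):
    sharpened = reverse_heat_step(sharpened)

var_blurred = float(np.var(blurred))
var_sharpened = float(np.var(sharpened))
print(var_sharpened > var_blurred)
```

The reverse heat equation is ill-posed and amplifies noise as well as detail, so only a small number of steps is normally applied.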

Because the depth map is required to be smooth in many applications, a mean shift segmentation 
and graph cut based method is proposed in [7] to infer the depth map of the scene. The confidence of depth 
estimation is incorporated into the energy function of graph cut to preserve details of the depth map [7]. 
Experimental results [7] show that the proposed method can produce a good depth map even from a single 
image. 


3. EXPERIMENTAL RESULTS OF LITERATURE SURVEY 

The authors of [3] test phase transfer function (PTF) estimation on the ALARM sequence, as shown in
Figure 1. Figure 1 (a) shows the first frame and Figure 1 (b) shows the first frame blurred by a synthetic
optical transfer function (OTF) consisting of a Gaussian modulation transfer function (MTF) and an
arbitrary PTF. Figure 1 (c) shows the reconstruction result using Restoration from Magnitude (RFM), a
technique [3] based on projections onto convex sets (POCS), where the two convex sets are the set of
space-limited functions and the set of all functions whose Fourier transform magnitude equals a prescribed
function. Figure 1 (d) shows the reconstruction result using only the proposed OTF magnitude estimation
[3], with the PTF taken to be zero. Figure 1 (e) shows the reconstruction result using both the proposed
OTF magnitude estimation and phase estimation [3]. It can be seen that the restoration including PTF
estimation performs better than the restoration without phase and the restoration using RFM [3].
Figure 2 shows the two different single-aperture models.
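
The alternating-projection idea behind RFM can be sketched as follows: the estimate is projected in turn onto the set of signals with the prescribed Fourier magnitude and onto the set of space-limited signals. This is a generic sketch of that scheme under assumed inputs (a synthetic image, a known support mask, a random starting point); it is not the implementation evaluated in [3], and all function names are hypothetical.

```python
import numpy as np


def project_magnitude(x, target_mag):
    """Replace the Fourier magnitude of x by target_mag, keeping its phase."""
    spectrum = np.fft.fft2(x)
    phase = np.exp(1j * np.angle(spectrum))
    return np.fft.ifft2(target_mag * phase).real


def project_support(x, support):
    """Set x to zero outside the known support (space-limited constraint)."""
    return np.where(support, x, 0.0)


def restore_from_magnitude(target_mag, support, iterations=100, seed=0):
    """Alternate between the two projections, starting from random values."""
    rng = np.random.default_rng(seed)
    x = project_support(rng.random(support.shape), support)
    for _ in range(iterations):
        x = project_support(project_magnitude(x, target_mag), support)
    return x


def magnitude_error(x, target_mag):
    """Distance between the Fourier magnitude of x and the prescribed one."""
    return float(np.linalg.norm(np.abs(np.fft.fft2(x)) - target_mag))


# Synthetic space-limited image and its Fourier magnitude (assumed test data)
rng = np.random.default_rng(42)
support = np.zeros((32, 32), dtype=bool)
support[8:16, 8:16] = True
truth = np.where(support, rng.random((32, 32)), 0.0)
target_mag = np.abs(np.fft.fft2(truth))

err_start = magnitude_error(restore_from_magnitude(target_mag, support, 1), target_mag)
restored = restore_from_magnitude(target_mag, support, 100)
err_end = magnitude_error(restored, target_mag)
print(err_start, err_end)
```

The magnitude error does not increase from one iteration to the next, but the iteration may stagnate before reaching the true image, and a magnitude-only constraint carries no phase information of its own.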





Figure 1. Comparison of focused image reconstruction methods for video ALARM: [3] (a) original image; 
(b) blurred image, (c) focused image reconstruction using restoration from magnitude (RFM), 
(d) focused image reconstruction using blur function magnitude estimation, 
(e) focused image reconstruction using blur function magnitude and phase estimation [3] 

Figure 2. Two different single-aperture models [4]. (a) Aperture is aligned on the optical axis of the camera (b) Aperture
is shifted away from the optical axis, which produces various convergence positions according
to the distance of an object


As shown in Figure 3 (b), the single-eye range method mostly failed to estimate depth of the 
defocused scene background in [4]. On the other hand, the proposed depth estimation method [4] produces 
the best result as shown in Figure 3 (c). 








Figure 3. Comparison of depth estimation of three different algorithms [4]. (a) Input image acquired by the MCA camera 
(b) Result of depth estimation using the single-eye range method (c) Result of the proposed depth estimation algorithm 

Figure 4. Results on the Shrub stereo pair. Estimated disparity map using [5] (a) SD with affine illumination. (b)
Normalized cross correlation. (c) GC with histogram transform. (d) MRF with rank transform. (e) SGM algorithm. (f)
Proposed method


From Figure 4, it is noticed that local methods give noisy results and are very sensitive to 
illumination changes [5], while the SGM algorithm and the proposed method in [5] allow obtaining a smooth 
disparity map with sharp depth discontinuities. Both GC and MRF algorithms combined with a histogram 
and rank transform, respectively in [5], also show good performance for this stereo pair, due to the presence 
of large homogeneous regions. 

As illustrated in Figure 5, the proposed method [6] has the additional advantage that it can be applied
to DFD without constraining the PSF to be Gaussian. By enforcing a sparsity constraint [6], the authors
also addressed the problem of depth estimation when an observation undergoes simultaneous motion and
optical blur.





Figure 5. (a) Reference image (b) Simultaneously blurred observation (c) Known optical blur (d) Estimated 
motion kernel (e) Estimated depth by proposed method [6] 







Figure 6. (a) The images of Middlebury Stereo Datasets; (b) The ground truth depth map; (c) The depth map estimated by 
proposed method [7] 


The experimental results of [7] in Figure 6 show that this depth estimation technique is reliable.
In [7], the authors present a passive depth estimation method that extracts a depth map from a single image
captured with a narrow depth of field. This method [7] employs the reverse heat equation as a
pre-processing step and uses the proposed hierarchical mean shift segmentation and a graph cut with a
confidence term to infer the depth map.

From the above literature survey, it is observed that estimating depth from a defocused or blurred
image is a major challenge. However, if the focal length and focal error of the lens and the intensity of
light are known, a depth map can be estimated by various approaches and then used to reconstruct a 3D
view.


4. CONCLUSION 
Focus quality is a measure of the amount of defocus in the image. Depth from Defocus or Depth from 
Motion Blur is a challenge. 